Discovering Author Groups using a β-compact graph-based Clustering

Authors

  • Yasmany García-Mondeja
  • Daniel Castro-Castro
  • Vania Lavielle-Castro
  • Rafael Muñoz
Abstract

Identifying the author of an anonymous or disputed document is a cornerstone of automatic forensic applications, and it is a challenging task for both humans and computers. Clustering documents according to the linguistic style of their authors has received little attention from the research community. To address this problem, the PAN Evaluation Framework made the first effort to promote the development of author clustering. This article proposes a graph-based method, specifically β-compact clustering, for discovering the groups of documents written by the same author. The β-compact algorithm analyzes the similarity between documents: two documents belong to the same group as long as the similarity between them exceeds the threshold β and is the maximum similarity with respect to the other documents. In our proposal we evaluated different linguistic features and similarity measures presented in previous work on authorship analysis. The training dataset was used to determine the best value of the β parameter for each language. The experimental results were encouraging.
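
The grouping rule described in the abstract can be illustrated with a minimal sketch. It is one possible reading of that rule, not the authors' implementation: it assumes a precomputed symmetric similarity matrix, links each document to its most similar neighbour(s) only when that similarity exceeds β, and reports the connected components of the resulting graph as groups. The function name `beta_compact_clusters` and the toy matrix are hypothetical.

```python
from typing import Dict, List


def beta_compact_clusters(sim: List[List[float]], beta: float) -> List[List[int]]:
    """Group documents by linking each one to its most similar neighbour(s).

    Assumption: `sim` is a symmetric matrix of pairwise similarities, and a
    link is kept only if the maximum similarity strictly exceeds `beta`.
    Groups are the connected components of the resulting undirected graph.
    """
    n = len(sim)
    parent = list(range(n))  # union-find forest over document indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            continue
        best = max(sim[i][j] for j in others)
        if best <= beta:
            continue  # no neighbour is similar enough; i may stay a singleton
        for j in others:
            if sim[i][j] == best:  # link to every maximally similar document
                union(i, j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy similarity matrix for four documents (illustrative values only).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(beta_compact_clusters(sim, beta=0.5))
print(beta_compact_clusters(sim, beta=0.85))
```

Raising β in this sketch splits weakly linked documents into singleton groups, which is why the threshold has to be tuned on training data, as the abstract notes.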

Similar resources

Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)

Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

A New Clustering Approach for Symbolic Data: Algorithms and Application to Healthcare Data

Graph coloring is used to characterize some properties of graphs. A b-coloring of a graph G (using colors 1,2,...,k) is a coloring of the vertices of G such that (i) two neighbors have different colors (proper coloring) and (ii) for each color class there exists a dominating vertex which is adjacent to all other k-1 color classes. In this paper, we build on b-coloring of a graph to propose a ne...

Automatic Multimedia Knowledge Discovery, Summarization and Evaluation

This paper presents novel methods for automatically discovering, summarizing and evaluating multimedia knowledge from annotated images in the form of image clusters, word senses and relationships among them, among others. These are essential for applications to intelligently, efficiently and coherently deal with multimedia. The proposed methods include automatic techniques (1) for constructing...

DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree

3:00 PM Session 1 DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree Nurul F. Zulkurnain, David J. Haglin and John A. Keane Triangular Kernel Nearest-Neighbor-based Clustering Algorithm for Discovering True Clusters Aina Musdholifah and Siti Zaiton Mohd Hashim An Improved Genetic Clustering Algorithm for Categorical Data Hongwu Qin, Xiuqin Ma, Tutut Herawan, ...

Publication date: 2017